This example notebook shows the steps of constructing an automated machine learning pipeline for crop type identification in an area of interest in Austria. Along the pipeline, two different approaches are applied and compared. The first one, LightGBM, represents a state-of-the-art machine learning algorithm. The second one is a Temporal Convolutional Neural Network architecture from the field of deep learning. The prediction is performed on a time series of Sentinel-2 scenes from 2018. The notebook will lead you through the whole process of creating the pipeline, with details provided at each step (see Overview).
Enjoying the functionality of eo-learn and the simplicity of this example workflow is preceded by the unavoidable setup of an adequate working environment. But trust us, it's worth it! And we'll take you by the hand.
To run the example you'll need a Sentinel Hub account. If you do not have one yet, you can create a free trial account at the Sentinel Hub webpage. If you are a researcher you can even apply for a free non-commercial account at the ESA OSEO page.
Once you have the account set up, log in to the Sentinel Hub Configurator. By default you will already have a default configuration with an instance ID (an alphanumeric code of length 36). For this tutorial we recommend that you create a new configuration (Add new configuration) and base it on the Python scripts template. Such a configuration will already contain all layers used in the more general Land Use/Land Cover (LULC) example, which are adopted for this example. Otherwise you will have to define the layers for your configuration yourself.
One layer you have to define yourself is your "MY-S2-L2A-BANDS" layer. To do so:
1. Open the Configuration Utility and access your newly created LULC configuration.
2. Click "+ Add new layer" and name the layer MY-S2-L2A-BANDS.
3. Enter return [B02,B03,B04,B05,B06,B07,B08,B8A,B11,B12] into the Custom script editor and click "</> Set Custom Script".
4. Save your own layer.
After you have prepared the configuration, please put the configuration's instance ID into the sentinelhub package's configuration file, following the configuration instructions.
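One way to store the instance ID is the sentinelhub package's SHConfig helper. This is a sketch of a config fragment; the placeholder ID is yours to fill in from the Configurator.

```python
from sentinelhub import SHConfig

config = SHConfig()
config.instance_id = '<your-instance-id>'  # the 36-character ID from the Configurator
config.save()                              # persists the setting to sentinelhub's config file
```
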
The Sentinel Hub Python package allows users to make OGC (WMS and WCS) web requests to download and process satellite images within your Python scripts. It supports Sentinel-2 L1C and L2A, Sentinel-1, Landsat 8, MODIS and DEM data sources.
Between the acquisition of a satellite image and actionable information lies a large processing effort. eo-learn, a collection of modular Python sub-packages, allows easy and quick processing of spatio-temporal data to prototype, build and automate the required large-scale EO workflows for AOIs of any size. It also directly enables the application of state-of-the-art computer vision, machine learning and deep learning packages in Python to the data. Especially for non-experts in remote sensing and machine learning, it makes the extraction of valuable information from satellite imagery easier and more comfortable.
With the help of the eo-learn library, the entire classification process can be executed in 4 processing blocks, i.e. EOWorkflows.
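Conceptually, each processing block chains modular tasks that consume and return a data container. As a plain-Python illustration of that chaining idea (NOT the actual eolearn API; all class names here are made up), it could look like this:

```python
# A toy illustration of how modular tasks chain into a linear workflow,
# mimicking the spirit of eo-learn's EOTask / LinearWorkflow (not the real API).
class AddFeatureTask:
    def __init__(self, name, value):
        self.name, self.value = name, value

    def execute(self, patch):
        patch[self.name] = self.value      # add a "feature" to the patch
        return patch

class ScaleFeatureTask:
    def __init__(self, name, factor):
        self.name, self.factor = name, factor

    def execute(self, patch):
        patch[self.name] *= self.factor    # transform an existing feature
        return patch

class ToyLinearWorkflow:
    def __init__(self, *tasks):
        self.tasks = tasks

    def execute(self, patch=None):
        patch = {} if patch is None else patch
        for task in self.tasks:            # tasks run one after the other
            patch = task.execute(patch)
        return patch

workflow = ToyLinearWorkflow(AddFeatureTask('bands', 3), ScaleFeatureTask('bands', 2))
result = workflow.execute()
print(result)  # {'bands': 6}
```

The real LinearWorkflow used below works on EOPatches instead of dictionaries, but the execution model is the same.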
In more detail the notebook is structured as follows:
I. Imports
II. Configurations
Now, after the setup you are curious what is next and can't wait to get your hands dirty? Well, let's get started!
Let's start with some necessary imports.
# set module directory to system path
import sys, os
MAIN_FOLDER = os.getcwd()
import_path = os.path.join(MAIN_FOLDER, 'Tasks')
if import_path not in sys.path:
sys.path.append(import_path)
# Built-in modules
import math
import shutil
import itertools
from datetime import timedelta
# Basics of Python data handling and visualization
import pandas as pd
import geopandas as gpd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
from shapely.geometry import Polygon
from tqdm.notebook import tqdm
# Imports from eo-learn, sentinelhub-py, and perceptive-sentinel
from sentinelhub import CRS, BBoxSplitter, MimeType
from eolearn.core import LinearWorkflow, FeatureType, SaveTask, OverwritePermission, LoadTask
from eolearn.core import EOPatch, EOTask, CreateEOPatchTask, ZipFeatureTask, MapFeatureTask
from eolearn.geometry import VectorToRaster, ErosionTask
from eolearn.io import S2L2AWCSInput, AddSen2CorClassificationFeature, ExportToTiff
from eolearn.mask import get_s2_pixel_cloud_detector, AddCloudMaskTask, AddValidDataMaskTask
from eolearn.features import SimpleFilterTask, LinearInterpolation
from eolearn.features import NormalizedDifferenceIndexTask, EuclideanNormTask
# Machine learning
import lightgbm as lgb
import joblib
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn import metrics
# Deep Learning
from keras.models import Sequential
from keras.layers import Dense, Flatten, Dropout
from keras.layers.convolutional import Conv1D, MaxPooling1D
from keras.utils import to_categorical
# Notebook specific classes and functions
from CropTypeClassTasks import CleanLPIS, GroupLPIS, LPISCLASS, ConcatenateData, SamplingTaskTask
from CropTypeClassTasks import train_test_split_eopatches, train_test_split_eopatch
from CropTypeClassTasks import plot_confusion_matrix, PredictPatch, AddAreaRatio, FixLPIS, masking
from CropTypeClassTasks import AddGeopediaVectorFeature, Sen2CorValidData, ValidDataFractionPredicate
In this part you can define your configurations. The basic configurations are set so that the example runs smoothly.
# define in- and output folders
output_path = os.path.join(MAIN_FOLDER, 'Output')
general_data_path = os.path.join(MAIN_FOLDER, 'GeneralData')
patch_path = os.path.join(MAIN_FOLDER, 'Output', 'EOPatches')
thresLPIS_path = os.path.join(MAIN_FOLDER, 'Output', 'EOPatches_Low_LPIS_Thres')
samples_path = os.path.join(MAIN_FOLDER, 'Output', 'Samples')
models_path = os.path.join(MAIN_FOLDER, 'Output', 'Models')
predictions_path = os.path.join(MAIN_FOLDER, 'Output', 'Predictions')
# For reference colormap
lpisclass_cmap = mpl.colors.ListedColormap([entry.color for entry in LPISCLASS])
lpisclass_norm = mpl.colors.BoundaryNorm(np.arange(-0.5, 26, 1), lpisclass_cmap.N)
class_names = [entry.class_name for entry in LPISCLASS]
class_ids = [entry.id for entry in LPISCLASS]
### 1. BBox-Splitter
## Plot AOI and give extent
INPUT_FILE = os.path.join(general_data_path, 'Area_AOI.geojson') # Geojson or Shapefile of the area of interest
austria = os.path.join(general_data_path, 'Area_Austria.geojson') # Geojson of austrian national borders
crs = CRS.UTM_33N # wanted coordinate System of the AOI
### 2. Add ground truth data
## Create EOPatches and add LPIS + area ratio
year = '2018' # year of interest
layerID_dict = {'2018': 2647, '2017': 2034, '2016': 2033} # Layer IDs of Geopedia Layer
layerID = layerID_dict[year] # Layer ID for Austria of year set by "year"
patch_list = os.listdir(patch_path) # List of created EOPatches names
### 3. Add EO data
## Choose EO features
maxcloud = 0.8 # maximum cloud coverage of Sentinel tiles used for download
datafrac = 0.7 # keep only frames with a valid data fraction over x%
## Clean EOPatch list
lpis_thres = 0.13 # Patches with less than x% of LPIS coverage are excluded in the following
## Add EO data
time_interval = [f'{year}-01-01', f'{year}-09-30'] # the start and end date for downloading satellite data
### 4. Feature and label engineering
## Feature concatenation, interpolation and LPIS data preparation
day_range = 8 # the real acquisition dates of valid satellite images are resampled to an x-day equidistant range
## Prepare LPIS data
grouping_id = 'basic' # define grouping id (has to be identical to endings of the two grouping files)
# File that contains LPIS Crop ID and wanted groups - Columns of shape: CROP_ID, english, slovenian, latin, GROUP_1
lpis_to_group_file = os.path.join(general_data_path, 'at_lpis_{}_crop_to_group_mapping_{}.csv'.format(year, grouping_id))
# File that contains the wanted groups and their ID - Columns of shape: GROUP_1, GROUP_1_ID
crop_group_file = os.path.join(general_data_path, 'crop_group_1_definition_{}.csv'.format(grouping_id))
### 5. Sampling
## Sampling per EOPatch
pixel_thres = 1000 # pixel threshold necessary for a class to be considered in sampling
samp_class = 500 # take x samples per class per EOPatch
## Combine samples and split into train and test data
test_ratio = 4 # take every xth patch for testing
features_dict = 'FEATURES_SAMPLED' # name of the dictionary where the sample features are saved
labels_dict = 'LPIS_class_{}_ERODED_SAMPLED'.format(grouping_id) # name of the dictionary where the sample labels are saved
Spotlight on your "INPUT_FILE" configuration! This is where you can easily adapt the workflow to your needs. Take your pick and replace the AOI file in the General data folder. Your AOI, provided either as a shapefile or as a GeoJSON file, is split into smaller patches by eo-learn. The total number of patches depends on the AOI's size. The automated splitting is meant to create patches of roughly 10 x 10 km.
# define Area of interest
aoi = gpd.read_file(INPUT_FILE) # read AOI file
aoi_shape = aoi.geometry.values[-1] # get aoi shape
aoi = aoi.to_crs(crs={'init': CRS.ogc_string(crs)}) # reproject the AOI to the chosen coordinate system
# define BBox-Splitter split values
ShapeVal_a = round(aoi_shape.bounds[2] - aoi_shape.bounds[0])
ShapeVal_b = round(aoi_shape.bounds[3] - aoi_shape.bounds[1])
SplitVal_a = max(1, int(ShapeVal_a/1e4))
SplitVal_b = max(1, int(ShapeVal_b/1e4))
# Give extent of AOI + grid count and plot AOI
print('The extent of the AOI is {}m x {}m, so it is split into a grid of {} x {}.'.format(ShapeVal_a,
ShapeVal_b,
SplitVal_a,
SplitVal_b))
aoi.plot()
plt.axis('off');
The simple patch polygons are transformed into bounding boxes that serve as the geometrical frames of the EOPatches.
# split area of interest into an appropriate number of BBoxes
bbox_splitter = BBoxSplitter([aoi_shape], crs, (SplitVal_a, SplitVal_b))
bbox_list = np.array(bbox_splitter.get_bbox_list()) # get list of BBox geometries
info_list = np.array(bbox_splitter.get_info_list()) # get list of x (column) and y(row) indices
print('Each bounding box also has some info how it was created.\nExample:\n'
'bbox: {}\ninfo: {}\n'.format(bbox_list[0].__repr__(), info_list[0]))
First, visualize the GeoDataFrame of the bounding boxes.
# create GeoDataFrame of BBoxes
geometry = [Polygon(bbox.get_polygon()) for bbox in bbox_list] # get geometry from bbox_list for creating GeoSeries
idxs_x = [info['index_x'] for info in info_list] # get column index for naming EOPatch
idxs_y = [info['index_y'] for info in info_list] # get row index for naming EOPatch
gdf = gpd.GeoDataFrame({'index_x': idxs_x, 'index_y': idxs_y},
crs={'init': CRS.ogc_string(crs)},
geometry=geometry)
shapefile_name = os.path.join(output_path, 'BBoxes.shp')
gdf.to_file(shapefile_name)
gdf.head()
Second, visualize the split AOI with reference to the Austrian national borders.
# Plot AOI overview
austria_gdf = gpd.read_file(austria)
fontdict = {'family': 'monospace', 'weight': 'normal', 'size': 11}
# if bboxes have all same size, estimate offset
xl, yl, xu, yu = gdf.geometry[0].bounds
xoff, yoff = (xu - xl) / 3, (yu - yl) / 5
# main figure
fig, ax = plt.subplots(figsize=(20, 20))
gdf.plot(ax=ax,facecolor='w',edgecolor='r',alpha=0.5,linewidth=5)
aoi.plot(ax=ax, facecolor='w',edgecolor='k',alpha=0.5)
austria_gdf.plot(ax=ax, facecolor='w',edgecolor='b',alpha=0.5)
ax.set_title('Test Area Split');
plt.axis('off')
# sub figure
a = plt.axes([0.2, 0.6, .2, .2])
gdf.plot(ax=a, facecolor='w',edgecolor='r',alpha=0.5, linewidth=3)
aoi.plot(ax=a, facecolor='w',edgecolor='k',alpha=0.5, linewidth=3)
plt.xticks([])
plt.yticks([])
Now it's time to create EOPatches and start filling them with data.
The data is stored in EOPatches, handy data containers that you can then fill endlessly. In this first step, each EOPatch receives the LPIS reference data together with the ratio of its area covered by LPIS parcels; the importance of this ratio will become apparent in the following steps. An EOPatch is created and manipulated using EOTasks. Due to the potentially large number of EOPatches, automation of the processing pipeline is absolutely crucial. Therefore, EOTasks are chained in an EOWorkflow. In this example the final workflow is executed on all patches, which are saved to the specified directory.
The EOTasks need to be put in some order and executed one by one. This can be achieved by manually executing the tasks, or more conveniently, defining an EOWorkflow which does this for you.
An EOWorkflow can be linear or more complex, but it should be acyclic. Here we will use the linear case of the EOWorkflow, available as LinearWorkflow.
# TASK FOR CREATING EOPATCH
create = CreateEOPatchTask()
# TASK FOR ADDING LPIS DATA FROM GEOPEDIA
# here you can choose the year of interest
# also you have to set the corresponding Geopedialayer-ID
add_lpis = AddGeopediaVectorFeature((FeatureType.VECTOR_TIMELESS, 'LPIS_{}'.format(year)),
layer=layerID, year_filter=None, drop_duplicates=True)
# TASK FOR ADDING AN AREA RATIO
# the area ratio indicates the EOPatches proportion of LPIS coverage
area_ratio = AddAreaRatio((FeatureType.VECTOR_TIMELESS, 'LPIS_{}'.format(year)),
(FeatureType.SCALAR_TIMELESS, 'FIELD_AREA_RATIO'))
# TASK FOR SAVING TO OUTPUT
save = SaveTask(patch_path, overwrite_permission=OverwritePermission.OVERWRITE_PATCH)
# define the workflow
workflow = LinearWorkflow(create,
add_lpis,
area_ratio,
save)
# execute workflow
pbar = tqdm(total=len(bbox_list))
for idx, bbox in enumerate(bbox_list):
info = bbox_splitter.info_list[idx]
patch_name = f'eopatch_{idx}_col-{info["index_x"]}_row-{info["index_y"]}'
workflow.execute({create:{'bbox':bbox}, save:{'eopatch_folder':patch_name}})
pbar.update(1)
Visualize the added vector data for one example EOPatch
eopatch_name = 'eopatch_0_col-0_row-0' # get the name of the first newly created EOPatch
eopatch = EOPatch.load(os.path.join(patch_path, eopatch_name))
# plot vector data
print('Plotting LPIS vector data of eopatch: {}'.format(eopatch_name))
fig, ax = plt.subplots(figsize=(20, 20))
LPIS = eopatch.vector_timeless['LPIS_{}'.format(year)]
LPIS.plot(column='SNAR_BEZEI', ax=ax, categorical=True)
ax.set_aspect('auto')
ax.set_xticks(ticks=[])
ax.set_yticks(ticks=[])
del eopatch
As you can see, the crop types in your AOI are very diverse. Each colour stands for one of the over 200 LPIS classes.
Now, it's time to add Sentinel-2 data to the EOPatches. You are lucky to be using eo-learn, as this is done simply by setting up another EOWorkflow, in which a single EOTask adds your satellite images. The remaining tasks allow you to create extensive valid data masks and useful indices using a ridiculously small amount of code.
In detail you add:
A mask of validity, based on acquired data from Sentinel and cloud coverage.
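The band indices passed to NormalizedDifferenceIndexTask below refer to positions in the band stack of the custom MY-S2-L2A-BANDS layer. A quick sanity check of which bands the index pairs [6, 2] and [1, 6] select (the helper function is a hypothetical stand-in for the generic formula the task evaluates):

```python
# band order of the MY-S2-L2A-BANDS layer defined in the configuration step
bands = ['B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B11', 'B12']

def normalized_difference(a, b):
    # the generic normalized-difference formula for a band pair
    return (a - b) / (a + b)

# indices [6, 2] -> (B08 - B04) / (B08 + B04), i.e. NDVI
print(bands[6], bands[2])  # B08 B04
# indices [1, 6] -> (B03 - B08) / (B03 + B08), i.e. NDWI
print(bands[1], bands[6])  # B03 B08

print(normalized_difference(0.4, 0.1))
```
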
# TASK TO LOAD EXISTING EOPATCH
load = LoadTask(patch_path)
# TASK TO ADD SENTINEL 2 LEVEL 2A DATA
# Here also a simple filter of cloudy scenes is done. A detailed cloud cover
# detection is performed within the next steps
add_l2a = S2L2AWCSInput(layer='MY-S2-L2A-BANDS',
resx='10m',
resy='10m',
maxcc=maxcloud,
time_difference=timedelta(hours=2),
raise_download_errors=False)
# TASK TO ADD SEN2COR'S SCENE
# add sen2cor's scene classification map and snow probability map
add_scl = AddSen2CorClassificationFeature(sen2cor_classification='SCL',
layer='TRUE-COLOR-S2-L2A',
image_format=MimeType.TIFF_d32f,
raise_download_errors=False)
# TASK FOR CLOUD INFO
# cloud detection is performed at 160m resolution
# and the resulting cloud probability map and mask
# are scaled to EOPatch's resolution
cloud_classifier = get_s2_pixel_cloud_detector(average_over=2, dilation_size=1, all_bands=False)
add_clm = AddCloudMaskTask(cloud_classifier,
'BANDS-S2CLOUDLESS',
cm_size_y='160m',
cm_size_x='160m',
cmask_feature='CLM')
# create valid data masks
scl_valid_classes = [2, 4, 5, 6, 7]
# TASKs FOR ADDING L2A and L1C VALID DATA MASKS
# convert cloudmask to validmask
add_clm_valid = MapFeatureTask((FeatureType.MASK, 'CLM'),
(FeatureType.MASK, 'CLM_VALID'),
np.logical_not)
# combine IS_DATA and CLM_VALID
add_l1c_valmask = ZipFeatureTask({FeatureType.MASK: ['IS_DATA', 'CLM_VALID']},
(FeatureType.MASK, 'L1C_VALID'),
np.logical_and)
# combine IS_DATA and SCL (using an erosion radius of 6 and a dilation radius of 22 pixel for SCL classes)
add_l2a_valmask = AddValidDataMaskTask(Sen2CorValidData(scl_valid_classes, 6, 22), 'L2A_VALID')
# combine all validmasks
add_valmask = ZipFeatureTask({FeatureType.MASK: ['L1C_VALID', 'L2A_VALID']},
(FeatureType.MASK, 'VALID_DATA'),
np.logical_and)
# TASK TO FILTER OUT SCENES INCLUDING TOO MANY INVALID PIXELS
# keep frames with > x % valid coverage
valid_data_predicate = ValidDataFractionPredicate(datafrac)
filter_task = SimpleFilterTask((FeatureType.MASK, 'VALID_DATA'), valid_data_predicate)
# TASK FOR CALCULATING INDICES
# NDVI = Normalized Difference Vegetation Index
# NDWI = Normalized Difference Water Index
# NORM = Euclidean Norm
ndvi = NormalizedDifferenceIndexTask((FeatureType.DATA, 'MY-S2-L2A-BANDS'),
(FeatureType.DATA, 'NDVI'),
[6, 2])
ndwi = NormalizedDifferenceIndexTask((FeatureType.DATA, 'MY-S2-L2A-BANDS'),
(FeatureType.DATA, 'NDWI'),
[1, 6])
norm = EuclideanNormTask((FeatureType.DATA, 'MY-S2-L2A-BANDS'), (FeatureType.DATA, 'NORM'))
# TASK FOR SAVING TO OUTPUT
save = SaveTask(patch_path, compress_level=1, overwrite_permission=OverwritePermission.OVERWRITE_PATCH)
workflow = LinearWorkflow(load,
add_l2a,
add_scl,
add_clm,
add_clm_valid,
add_l1c_valmask,
add_l2a_valmask,
add_valmask,
filter_task,
ndvi,
ndwi,
norm,
save)
Most likely, along with this innovative workflow, you are pushing humankind forward with other processes on your machine. Therefore you do not want to waste your resources on EOPatches containing very little agricultural area. Before running your already set-up EOWorkflow, clean your EOPatch list.
Remember the earlier calculated LPIS ratio? From here on, you only keep EOPatches containing more than 13% agricultural area. The irrelevant ones are moved to the sidetrack. If you want to use EOPatches more extensively covered with agricultural area, simply increase your "lpis_thres" configuration.
# in the GeoDataFrame, label patches as either to do (1) or not to do (0) depending on the threshold
gdf[f'far{year}'] = -2.0
for idx, row in gdf.iterrows():
patch_name = os.path.join(patch_path, f'eopatch_{idx}_col-{row.index_x}_row-{row.index_y}')
eop = EOPatch.load(str(patch_name), lazy_loading=True)
gdf.loc[idx, f'far{year}'] = eop.scalar_timeless['FIELD_AREA_RATIO'][0]
gdf[f'todo{year}'] = (gdf[f'far{year}'] > lpis_thres) * 1
gdf.to_file(shapefile_name)
# move EOPatch folders with LPIS coverage beneath the threshold into a separate folder
move = []
patch_list_delete = gpd.read_file(shapefile_name)
patch_list_delete = patch_list_delete[patch_list_delete[f'todo{year}'] == 0] # identify EOPatches with LPIS coverage below the threshold
# create list including names of the identified EOPatches
for idx in patch_list_delete.index:
info = bbox_splitter.info_list[idx]
patch_name = f'eopatch_{idx}_col-{info["index_x"]}_row-{info["index_y"]}'
move.append(patch_name)
print('EOPatches moved to sidetrack: ' + str(move))
# move identified EOPatches to alternative folder
for patch_name in move:
shutil.move(os.path.join(patch_path, patch_name), os.path.join(thresLPIS_path, patch_name))
patch_list = os.listdir(patch_path) # update patch_list
Then go ahead and run your EOWorkflow on the basis of your "time_interval" configuration!
# execute workflow and save the names of those that failed
failed = []
pbar = tqdm(total=len(patch_list))
for patch_name in patch_list:
# add EO data if possible
try:
workflow.execute({load: {'eopatch_folder': patch_name},
add_l2a: {'time_interval': time_interval},
save: {'eopatch_folder': patch_name}})
# append EOPatch name to list for further investigation
except Exception as ex:
print(f'Failed {patch_name} with {ex}')
failed.append(patch_name)
pbar.update()
The classifier you are using for the following prediction is very picky about the format of its input data. To feed your thoughtfully compiled data to the algorithm, it needs some preparation.
Now, after all necessary data is added, let's load a single EOPatch and look at its structure. By executing
EOPatch.load(os.path.join(patch_path, 'eopatch_0_col-0_row-0'))
You obtain the following structure:
EOPatch(
data: {
MY-S2-L2A-BANDS: numpy.ndarray(shape=(39, 1028, 1033, 10), dtype=float32)
NDVI: numpy.ndarray(shape=(39, 1028, 1033, 1), dtype=float32)
NDWI: numpy.ndarray(shape=(39, 1028, 1033, 1), dtype=float32)
NORM: numpy.ndarray(shape=(39, 1028, 1033, 1), dtype=float32)
}
mask: {
CLM: numpy.ndarray(shape=(39, 1028, 1033, 1), dtype=bool)
CLM_VALID: numpy.ndarray(shape=(39, 1028, 1033, 1), dtype=bool)
IS_DATA: numpy.ndarray(shape=(39, 1028, 1033, 1), dtype=bool)
L1C_VALID: numpy.ndarray(shape=(39, 1028, 1033, 1), dtype=bool)
L2A_VALID: numpy.ndarray(shape=(39, 1028, 1033, 1), dtype=bool)
SCL: numpy.ndarray(shape=(39, 1028, 1033, 1), dtype=int32)
VALID_DATA: numpy.ndarray(shape=(39, 1028, 1033, 1), dtype=bool)
}
scalar: {}
label: {}
vector: {}
data_timeless: {}
mask_timeless: {}
scalar_timeless: {
FIELD_AREA_RATIO: numpy.ndarray(shape=(1,), dtype=float64)
}
label_timeless: {}
vector_timeless: {
LPIS_2018: geopandas.GeoDataFrame(columns=['geometry', 'FS_KENNUNG', 'SL_FLAECHE', 'ID', 'SNAR_BEZEI', 'DateImported'], length=4091, crs=EPSG:32633)
}
meta_info: {
maxcc: 0.8
service_type: 'wcs'
size_x: '10m'
size_y: '10m'
time_difference: datetime.timedelta(seconds=7200)
time_interval: ['2018-01-01', '2018-09-30']
}
bbox: BBox(((420862.3179607267, 5329537.336315366), (431194.28800678457, 5339817.792378783)), crs=CRS('32633'))
timestamp: [datetime.datetime(2018, 1, 6, 10, 4, 51), ..., datetime.datetime(2018, 9, 28, 10, 0, 24)], length=39
)
As you can see, your EO data and indices are stored as data features (FeatureType.DATA), your valid data masks as mask features (FeatureType.MASK) and your ground truth data as a timeless vector feature (FeatureType.VECTOR_TIMELESS).
It is possible to access various EOPatch content via calls like:
eopatch.timestamp
eopatch.vector_timeless['LPIS_2018']
eopatch.data['NDVI'][0]
eopatch.data['MY-S2-L2A-BANDS'][5][..., [3, 2, 1]]
In order to get a quick and realistic overview of your AOI, plot the true color image of one EOPatch.
eopatch_name = 'eopatch_0_col-0_row-0' # get the name of the first newly created EOPatch
eopatch = EOPatch.load(os.path.join(patch_path, eopatch_name), lazy_loading=True)
fig, ax = plt.subplots(figsize=(20, 20))
plt.imshow(np.clip(eopatch.data['MY-S2-L2A-BANDS'][0][..., [2, 1, 0]] * 3.5, 0, 1))
plt.xticks([])
plt.yticks([])
ax.set_aspect('auto')
del eopatch
Plot the time-wise mean of NDVI for the whole region. Filter out clouds in the mean calculation.
eopatch_name = 'eopatch_0_col-0_row-0' # get the name of the first newly created EOPatch
eopatch = EOPatch.load(os.path.join(patch_path, eopatch_name), lazy_loading=True)
fig, ax = plt.subplots(figsize=(20, 20))
ndvi = eopatch.data['NDVI']
mask = eopatch.mask['VALID_DATA']
ndvi[~mask] = np.nan
ndvi_mean = np.nanmean(ndvi, axis=0).squeeze()
im = ax.imshow(ndvi_mean, vmin=0, vmax=0.8, cmap=plt.get_cmap('YlGn'))
ax.set_xticks([])
ax.set_yticks([])
ax.set_aspect('auto')
cb = fig.colorbar(im, ax=ax, orientation='horizontal', pad=0.01, aspect=100)
cb.ax.tick_params(labelsize=20)
plt.show()
del eopatch
To see what the valid data masks look like and how they work together, you can compare them to a regular RGB image. For demonstration purposes, a timeframe is selected that contains cloud-covered area.
eopatch_name = 'eopatch_0_col-0_row-0' # get the name of the first newly created EOPatch
eopatch = EOPatch.load(os.path.join(patch_path, eopatch_name), lazy_loading=True)
tidx = 1
plt.figure(figsize=(20,20))
plt.subplot(331)
plt.imshow(np.clip(eopatch.data['MY-S2-L2A-BANDS'][tidx][..., [2,1,0]] * 3.5,0,1))
plt.xticks([])
plt.yticks([])
plt.title('MY-S2-L2A-BANDS - RGB')
plt.subplot(332)
plt.imshow(eopatch.mask['IS_DATA'][tidx].squeeze(), vmin=0, vmax=1, cmap='gray')
plt.xticks([])
plt.yticks([])
plt.title('IS_DATA - Data availability')
plt.subplot(333)
plt.imshow(eopatch.mask['CLM'][tidx].squeeze(), vmin=0, vmax=1, cmap='gray')
plt.xticks([])
plt.yticks([])
plt.title('CLM - Cloudmask')
plt.subplot(334)
plt.imshow(eopatch.mask['L1C_VALID'][tidx].squeeze(), vmin=0, vmax=1, cmap='gray')
plt.xticks([])
plt.yticks([])
plt.title('L1C_VALID - L1C valid data mask')
plt.subplot(335)
plt.imshow(eopatch.mask['L2A_VALID'][tidx].squeeze(), vmin=0, vmax=1, cmap='gray')
plt.xticks([])
plt.yticks([])
plt.title('L2A_VALID - L2A valid data mask')
plt.subplot(336)
plt.imshow(eopatch.mask['SCL'][tidx].squeeze(), cmap='jet')
plt.xticks([])
plt.yticks([])
plt.title('SCL - Sen2Cor scene classification map')
plt.subplot(338)
plt.imshow(eopatch.mask['VALID_DATA'][tidx].squeeze(), vmin=0, vmax=1, cmap='gray')
plt.xticks([])
plt.yticks([])
plt.title('VALID_DATA - Combined valid data mask')
del eopatch
As you can see, invalid pixels from the different cloud masks and the Sen2Cor scene classification map are combined. For SCL, all classes except the valid ones defined above (scl_valid_classes = [2, 4, 5, 6, 7]) are considered invalid.
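The core of that SCL check can be sketched in a few lines, ignoring the erosion/dilation that Sen2CorValidData additionally applies (the class meanings in the comments follow Sen2Cor's documentation):

```python
import numpy as np

# the valid Sen2Cor scene classification (SCL) classes used in the workflow
scl_valid_classes = [2, 4, 5, 6, 7]

# toy SCL map: 0 = no data, 4 = vegetation, 8 = cloud medium probability, 5 = bare soil
scl = np.array([[0, 4],
                [8, 5]])
valid = np.isin(scl, scl_valid_classes)  # pixel-wise membership test
print(valid)
# [[False  True]
#  [False  True]]
```
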
Plot the mean of NDVI over all pixels in a single patch throughout the year. Filter out clouds in the mean calculation.
eopatch_name = 'eopatch_0_col-0_row-0' # get the name of the first newly created EOPatch
eopatch = EOPatch.load(os.path.join(patch_path, eopatch_name), lazy_loading=True)
ndvi_series = eopatch.data['NDVI']
time = np.array(eopatch.timestamp)
mask = eopatch.mask['VALID_DATA']
t, w, h, _ = ndvi_series.shape
ndvi_clean = ndvi_series.copy()
ndvi_clean[~mask] = np.nan # set values of invalid pixels to NaN's
# Calculate means, remove NaN's from means
ndvi_mean = np.nanmean(ndvi_series.reshape(t, w * h).squeeze(), axis=1)
ndvi_mean_clean = np.nanmean(ndvi_clean.reshape(t, w * h).squeeze(), axis=1)
time_clean = time[~np.isnan(ndvi_mean_clean)]
ndvi_mean_clean = ndvi_mean_clean[~np.isnan(ndvi_mean_clean)]
fig, ax = plt.subplots(figsize=(20, 5))
plt.plot(time_clean, ndvi_mean_clean, 's-', label = 'Mean NDVI with cloud cleaning')
plt.plot(time, ndvi_mean, 'o-', label='Mean NDVI without cloud cleaning')
plt.xlabel('Time', fontsize=15)
plt.ylabel('Mean NDVI over patch', fontsize=15)
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.legend(loc=2, prop={'size': 15});
ax.set_aspect('auto')
del eopatch
The displayed time series looks quite fragmented for Sentinel-2 data whose temporal resolution is so hyped, right?
This is what you get if you choose to keep only timeframes with a valid data fraction over 70%. You set the value in your "datafrac" configuration. If you expect a nice overview of the vegetation growing stages, reality kicks in and gives you mostly cloudy conditions in the first months of the year. The good thing about being picky about the validity of your timeframes is the reduced data volume; invalid frames contain no additional value for your later analysis anyway.
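The frame filtering relies on a predicate that is applied to each frame's valid-data mask. A minimal sketch of how such a predicate could look (the notebook imports its actual ValidDataFractionPredicate from CropTypeClassTasks, so this is an illustration, not that implementation):

```python
import numpy as np

class ValidDataFractionPredicate:
    """Keep a frame only if its fraction of valid pixels exceeds a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def __call__(self, array):
        # fraction of True pixels in the boolean valid-data mask
        coverage = np.sum(array) / np.prod(array.shape)
        return coverage > self.threshold

predicate = ValidDataFractionPredicate(0.7)
mostly_valid = np.ones((10, 10, 1), dtype=bool)
mostly_valid[:5, :2] = False       # 90% of the pixels remain valid
print(predicate(mostly_valid))     # True  (0.9 > 0.7)
print(predicate(~mostly_valid))    # False (0.1 < 0.7)
```

SimpleFilterTask calls such a predicate on every timeframe and drops those for which it returns False.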
By spatially sampling the EOPatches you randomly take a subset of pixels from a patch to use in machine learning training and testing. Here you only want to consider classes that are represented by a certain number of pixels.
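The per-class sampling idea can be sketched as follows (a simplified illustration with a hypothetical helper; the notebook's actual logic lives in SamplingTaskTask):

```python
import numpy as np

def sample_pixels_per_class(labels, pixel_thres, samp_class, seed=42):
    """Randomly pick up to `samp_class` pixel indices per class, considering
    only classes represented by at least `pixel_thres` pixels."""
    rng = np.random.default_rng(seed)
    flat = labels.ravel()
    sampled = []
    for cls in np.unique(flat):
        idx = np.flatnonzero(flat == cls)
        if idx.size < pixel_thres:
            continue  # class too rare to be considered in sampling
        take = min(samp_class, idx.size)
        sampled.append(rng.choice(idx, size=take, replace=False))
    return np.concatenate(sampled)

# toy label raster: class 1 dominates, class 2 is too rare
labels = np.ones((100, 100), dtype=np.uint8)
labels[:3, :3] = 2  # only 9 pixels of class 2
indices = sample_pixels_per_class(labels, pixel_thres=1000, samp_class=500)
print(indices.size)  # 500 samples, all drawn from class 1
```
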
# for linear interpolation find earliest and latest overlapping dates
# list EOPatches
eopatches = []
patch_list = os.listdir(patch_path)
for i in patch_list:
eopatches.append(EOPatch.load(os.path.join(patch_path, i), lazy_loading=True))
eopatches = np.array(eopatches)
# identify the earliest common date (the latest first acquisition among all patches)
timelist = []
for eopatch in eopatches:
timelist.append(eopatch.timestamp[0])
mindate = str(max(timelist).date())
print('Earliest common date: ' + str(max(timelist)))
# identify the latest common date (the earliest last acquisition among all patches)
timelist = []
for eopatch in eopatches:
timelist.append(eopatch.timestamp[-1])
maxdate = str(min(timelist).date())
print('Latest common date: ' + str(min(timelist)))
# TASK FOR LOADING EXISTING EOPATCHES
load = LoadTask(patch_path)
# TASK FOR CONCATENATION
# bands and indices are concatenated into one features dictionary
concatenate = ConcatenateData('FEATURES', ['MY-S2-L2A-BANDS','NDVI','NDWI','NORM'])
# TASK FOR LINEAR INTERPOLATION
# linear interpolation of full time-series and date resampling
resample_range = (mindate, maxdate, day_range)
linear_interp = LinearInterpolation(
'FEATURES', # name of field to interpolate
mask_feature=(FeatureType.MASK, 'VALID_DATA'), # mask to be used in interpolation
copy_features=[(FeatureType.VECTOR_TIMELESS, 'LPIS_{}'.format(year))], # features to keep
resample_range=resample_range, # set the resampling range
bounds_error=False # extrapolate with NaN's
)
# TASK TO FIX AUSTRIAN LPIS DATA
# on the basis of the wrongly defined column "SNAR_BEZEI"
# a column "SNAR_BEZEI_NAME" is added which defines the LPIS class
fixlpis = FixLPIS(feature='LPIS_{}'.format(year), country='Austria')
# TASK FOR GROUPING LPIS INTO WANTED CLASSES
# on the basis of the two grouping files an individual crop type grouping can be applied
# for changes these files have to be adapted
grouplpis = GroupLPIS(year=year, lpis_to_group_file=lpis_to_group_file, crop_group_file=crop_group_file)
# TASK FOR CONVERTING LPIS DATA FROM VECTOR TO RASTER FORMAT
# multiple rasterized layers applying different crop type groupings can be stored in an EOPatch
vtr = VectorToRaster(
vector_input=(FeatureType.VECTOR_TIMELESS, 'LPIS_{}'.format(year)),
raster_feature=(FeatureType.MASK_TIMELESS, 'LPIS_class_{}'.format(grouping_id)),
values_column='GROUP_1_ID',
raster_shape=(FeatureType.DATA, 'FEATURES'),
no_data_value=0)
# TASK FOR EROSION
# erode each class of the reference map
erosion = ErosionTask(mask_feature=(FeatureType.MASK_TIMELESS,'LPIS_class_{}'.format(grouping_id),
'LPIS_class_{}_ERODED'.format(grouping_id)), disk_radius=1)
# TASK FOR SPATIAL SAMPLING
spatial_sampling = SamplingTaskTask(grouping_id, pixel_thres, samp_class)
# TASK FOR SAVING TO OUTPUT
save = SaveTask(patch_path, overwrite_permission=OverwritePermission.OVERWRITE_PATCH)
# define the workflow
workflow = LinearWorkflow(load,
concatenate,
linear_interp,
fixlpis,
grouplpis,
vtr,
erosion,
spatial_sampling,
save)
pbar = tqdm(total=len(patch_list))
for patch_name in patch_list:
extra_param = {load: {'eopatch_folder': patch_name},
grouplpis: {'col_cropN_lpis': 'SNAR_BEZEI_NAME',
'col_cropN_lpistogroup': 'CROP_ID'},
save: {'eopatch_folder': patch_name}}
workflow.execute(extra_param)
pbar.update(1)
Now, after all the data is transformed and sampled, let's load the single EOPatch again and look at its structure. By executing
EOPatch.load(os.path.join(patch_path, 'eopatch_0_col-0_row-0'))
You obtain the following structure:
EOPatch(
data: {
FEATURES: numpy.ndarray(shape=(31, 1033, 1040, 13), dtype=float64)
FEATURES_SAMPLED: numpy.ndarray(shape=(31, 6000, 1, 13), dtype=float64)
}
mask: {}
scalar: {}
label: {}
vector: {}
data_timeless: {}
mask_timeless: {
LPIS_class_basic: numpy.ndarray(shape=(1033, 1040, 1), dtype=uint8)
LPIS_class_basic_ERODED: numpy.ndarray(shape=(1033, 1040, 1), dtype=uint8)
LPIS_class_basic_ERODED_SAMPLED: numpy.ndarray(shape=(6000, 1, 1), dtype=uint8)
}
scalar_timeless: {}
label_timeless: {}
vector_timeless: {
LPIS_2018: geopandas.GeoDataFrame(columns=['geometry', 'FS_KENNUNG', 'SL_FLAECHE', 'ID', 'SNAR_BEZEI', 'DateImported', 'SNAR_BEZEI_NAME', 'CROP_ID', 'english', 'slovenian', 'latin', 'GROUP_1', 'GROUP_1_original', 'GROUP_1_ID'], length=4140, crs=epsg:32633)
}
meta_info: {}
bbox: BBox(((420717.14926283853, 5329441.919254168), (431121.7036578405, 5339770.083848184)), crs=EPSG:32633)
timestamp: [datetime.datetime(2018, 1, 29, 0, 0), ..., datetime.datetime(2018, 9, 26, 0, 0)], length=31
)
Things have changed, haven't they?
Your 10 spectral bands and 3 indices are combined in FEATURES, and the randomly sampled pixels are stored in FEATURES_SAMPLED. After filtering, your valid data masks have been deleted, and your eroded and sampled reference data is available in practical raster format as timeless mask features (FeatureType.MASK_TIMELESS).
As you performed the spatial sampling for each patch separately, you have to combine the samples. But first you have to assign your EOPatches either to the training or the test dataset. In this case you take one in four EOPatches for testing.
Only classes present in both train and test dataset are considered in the classification.
The sampled features and labels are loaded and reshaped into $n \times m$, where $n$ represents the number of training pixels and $m = f \times t$ the total number of features, with $f$ the number of bands and band combinations (in this example 13) and $t$ the length of the resampled time series (in this example 34).
Terminology: In data science, features are commonly referred to as "X" and labels as "y".
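The reshaping described above can be sketched with plain NumPy. The shapes follow the FEATURES_SAMPLED and LPIS_class_basic_ERODED_SAMPLED arrays printed earlier; the array contents here are random placeholders:

```python
import numpy as np

# FEATURES_SAMPLED is stored as (t timestamps, n samples, 1, f features)
t, n, f = 31, 6000, 13
features_sampled = np.random.rand(t, n, 1, f)

# bring the sample axis first, then flatten time and features into m = t * f columns
X = np.moveaxis(features_sampled.squeeze(axis=2), 0, 1).reshape(n, t * f)

# the sampled reference raster (n, 1, 1) flattens to one label per pixel
labels_sampled = np.random.randint(0, 8, size=(n, 1, 1))
y = labels_sampled.reshape(n)

assert X.shape == (6000, 31 * 13) and y.shape == (6000,)
```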
patch_list = os.listdir(patch_path) # update patch list
# combine EOPatches to one dataset
eopatches = []
for i in patch_list:
eopatches.append(EOPatch.load(os.path.join(patch_path, i), lazy_loading=True))
eopatches = np.array(eopatches)
# depending on the number of EOPatches adjust test_ratio if necessary and split into test and train data accordingly
if len(patch_list) == 1:
# split combined dataset into train and test data
X_train, X_test, y_train, y_test, n_timesteps, n_features = train_test_split_eopatch(eopatches,
features_dict,
labels_dict)
elif len(patch_list) < 4:
test_ratio = 3
# split combined dataset into train and test data
X_train, X_test, y_train, y_test, n_timesteps, n_features = train_test_split_eopatches(eopatches,
test_ratio,
features_dict,
labels_dict)
else:
# split combined dataset into train and test data
X_train, X_test, y_train, y_test, n_timesteps, n_features = train_test_split_eopatches(eopatches,
test_ratio,
features_dict,
labels_dict)
# mask out labels that are not in both train and test data and also mask out samples where features include NaN values
X_train, X_test, y_train, y_test = masking(X_train, X_test, y_train, y_test)
total_samp_count = X_train.shape[0] + X_test.shape[0]
print('From your {} EOPatch(es) a total of {} samples were taken. '
'This sampling dataset includes {} training and {} test samples.'.format(len(patch_list),
total_samp_count,
X_train.shape[0],
X_test.shape[0]))
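The masking helper used above is defined earlier in the notebook; a minimal sketch of the same idea — keep only labels present in both splits and drop samples whose features contain NaNs (our own implementation, not the original):

```python
import numpy as np

def mask_samples(X_train, X_test, y_train, y_test):
    # keep only labels that occur in both the training and the test split ...
    common = np.intersect1d(np.unique(y_train), np.unique(y_test))

    def keep(X, y):
        # ... and drop samples whose features contain NaN values
        ok = np.isin(y, common) & ~np.isnan(X.reshape(len(X), -1)).any(axis=1)
        return X[ok], y[ok]

    X_train, y_train = keep(X_train, y_train)
    X_test, y_test = keep(X_test, y_test)
    return X_train, X_test, y_train, y_test
```

For example, a label that appears only in the training split is removed from both X_train and y_train, so both models see an identical label set.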
fig = plt.figure(figsize=(20, 15))
y_ids_train, y_counts_train = np.unique(y_train, return_counts=True)
plt.subplot(2, 1, 1)
plt.bar(range(len(y_ids_train)), y_counts_train)
plt.xticks(range(len(y_ids_train)), [class_names[i] for i in y_ids_train], rotation=90, fontsize=20);
plt.yticks(fontsize=20)
plt.grid(True)
plt.title('Training samples', size=20)
y_ids_test, y_counts_test = np.unique(y_test, return_counts=True)
plt.subplot(2, 1, 2)
plt.bar(range(len(y_ids_test)), y_counts_test)
plt.xticks(range(len(y_ids_test)), [class_names[i] for i in y_ids_test], rotation=90, fontsize=20);
plt.yticks(fontsize=20)
plt.grid(True)
plt.title('Test samples', size=20)
fig.subplots_adjust(wspace=0, hspace=1)
As you can see, you have managed to generate a well-balanced dataset. In both the 3/4 training and the 1/4 test dataset no group is under- or over-represented, which provides a reasonable basis for the following classification.
In the following you want to feed your samples into two different algorithms. To guarantee equivalent conditions for both models you need scaled features and one-hot-encoded labels.
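One-hot encoding turns each class ID into a unit vector. A hand-rolled equivalent of what the OneHotEncoder below produces, assuming contiguous class IDs 0..k-1 (the notebook's encoder handles arbitrary IDs by fitting on class_ids):

```python
import numpy as np

# three classes, so class k maps to row k of the 3x3 identity matrix
labels = np.array([0, 2, 1])
one_hot = np.eye(3)[labels]
assert one_hot.tolist() == [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
```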
# scale features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
X_train = np.reshape(X_train, (-1,n_timesteps,n_features))
X_test = np.reshape(X_test, (-1,n_timesteps,n_features))
# save feature related scaling properties
joblib.dump(scaler, os.path.join(samples_path,
'Scaler_{}.bin'.format(grouping_id)),
compress=True)
# one-hot-encode labels
y_train = y_train.reshape(-1, 1)
y_test = y_test.reshape(-1, 1)
enc = OneHotEncoder(sparse=False)
enc.fit(np.array(class_ids).reshape(-1, 1))
y_train = enc.transform(y_train)
y_test = enc.transform(y_test)
label_count = y_train.shape[1]
You can choose to save your samples for later applications. For the upcoming prediction part, this is not necessary.
# save samples(optional)
np.save(os.path.join(samples_path, 'X_train_{}'.format(grouping_id)), X_train)
np.save(os.path.join(samples_path, 'X_test_{}'.format(grouping_id)), X_test)
np.save(os.path.join(samples_path, 'y_train_{}'.format(grouping_id)), y_train)
np.save(os.path.join(samples_path, 'y_test_{}'.format(grouping_id)), y_test)
# load samples(optional)
X_train = np.load(os.path.join(samples_path, 'X_train_{}.npy'.format(grouping_id)))
X_test = np.load(os.path.join(samples_path, 'X_test_{}.npy'.format(grouping_id)))
y_train = np.load(os.path.join(samples_path, 'y_train_{}.npy'.format(grouping_id)))
y_test = np.load(os.path.join(samples_path, 'y_test_{}.npy'.format(grouping_id)))
Congrats, you've mastered the heavy preprocessing steps! Now, this is where the magic of Machine and Deep Learning happens.
State-of-the-art LightGBM is used as the ML model. It is a fast, distributed, high-performance gradient boosting framework based on decision tree algorithms, used for many ML tasks.
As a novel competitor, the TempCNN DL architecture enters the game. So far, Convolutional Neural Networks have mainly and successfully been applied to image and language recognition tasks. By modifying the convolutional filters of the architecture, the Temporal CNN is supposed to exploit the temporal information of satellite image time series.
The default hyper-parameters are used in this example. For more info on parameter tuning, check the documentation of the package.
%%time
# Set up training classes
rev_y_train = [np.argmax(y, axis=None, out=None) for y in y_train]
rev_y_train_unique = np.unique(rev_y_train)
# flatten features from (samples, timeframes, features) to (samples, timeframes * features)
a, b, c = X_train.shape
X_train_lgbm = X_train.reshape(a,b * c)
# Set up the LightGBM model
model_lgbm = lgb.LGBMClassifier(
objective='multiclass',
num_class=len(rev_y_train_unique),
metric='multi_logloss'
)
# Train the model
model_lgbm.fit(X_train_lgbm, rev_y_train)
# Save the model
joblib.dump(model_lgbm, os.path.join(models_path, 'model_lgbm_CropTypeClass_{}.pkl'.format(grouping_id)))
In this example, a proven architecture from the scientific paper linked above is adopted.
%%time
# Set up the TempCNN architecture
model_tcnn = Sequential()
model_tcnn.add(Conv1D(filters=5, kernel_size=3, activation='relu', input_shape=(n_timesteps,n_features)))
model_tcnn.add(Dropout(0.5))
model_tcnn.add(Conv1D(filters=5, kernel_size=3, activation='relu', input_shape=(n_timesteps,n_features)))
model_tcnn.add(Dropout(0.5))
model_tcnn.add(Conv1D(filters=5, kernel_size=3, activation='relu', input_shape=(n_timesteps,n_features)))
model_tcnn.add(Dropout(0.5))
model_tcnn.add(Flatten())
model_tcnn.add(Dense(256, activation='relu'))
model_tcnn.add(Dense(label_count, activation='softmax'))
model_tcnn.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# Train the model
model_tcnn.fit(X_train,
y_train,
validation_data=(X_test, y_test),
epochs=10,
batch_size=32,
verbose=1,
shuffle=True)
# Save the model
joblib.dump(model_tcnn, os.path.join(models_path, 'model_tcnn_CropTypeClass_{}.pkl'.format(grouping_id)))
Validation of the models is a crucial step in data science. All models are wrong, but some are less wrong than others, so model evaluation is important.
In order to validate the models, we predict the classes of the test set and compare the predicted labels to the "ground truth".
The validation is performed by evaluating various metrics, such as accuracy, precision, recall, and the $F_1$ score, some of which are nicely described in this blog post.
Get the overall accuracy (OA) and the weighted $F_1$ score
# flatten features from (samples, timeframes, features) to (samples, timeframes * features)
# and set up test classes
d, e, f = X_test.shape
X_test_lgbm = X_test.reshape(d, e * f)
rev_y_test = [np.argmax(y, axis=None, out=None) for y in y_test]
# Load the models
model_lgbm = joblib.load(os.path.join(models_path, 'model_lgbm_CropTypeClass_{}.pkl'.format(grouping_id)))
model_tcnn = joblib.load(os.path.join(models_path, 'model_tcnn_CropTypeClass_{}.pkl'.format(grouping_id)))
# get overall accuracy and weighted F1-score for LightGBM
py_test_lgbm = model_lgbm.predict(X_test_lgbm)
print('Classification accuracy LightGBM {:.1f}%'.format(100 * metrics.accuracy_score(rev_y_test, py_test_lgbm)))
print('Classification F1-score LightGBM {:.1f}%'.format(100 * metrics.f1_score(rev_y_test, py_test_lgbm, average='weighted')))
py_test_tcnn = np.argmax(model_tcnn.predict(X_test), axis=1)  # predict_classes was removed in newer Keras/TF versions
print('Classification accuracy TempCNN {:.1f}%'.format(100 * metrics.accuracy_score(rev_y_test, py_test_tcnn)))
print('Classification F1-score TempCNN {:.1f}%'.format(100 * metrics.f1_score(rev_y_test, py_test_tcnn, average='weighted')))
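The weighted $F_1$ score used above averages the per-class $F_1$ scores, weighting each class by its support (its number of true samples). A NumPy-only sketch for intuition (our own helper, equivalent to scikit-learn's average='weighted'):

```python
import numpy as np

def weighted_f1(y_true, y_pred):
    # per-class F1 = 2*TP / (2*TP + FP + FN), averaged with support as weights
    classes, support = np.unique(y_true, return_counts=True)
    f1s = []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        f1s.append(0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn))
    return float(np.average(f1s, weights=support))

y_true = np.array([0, 0, 1, 1, 1, 2])
y_pred = np.array([0, 1, 1, 1, 0, 2])
assert abs(weighted_f1(y_true, y_pred) - 2 / 3) < 1e-9
```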
$F_1$ score, precision, and recall for each class separately
# LightGBM: F1-score, precision, and recall for each class separately
class_labels = np.unique(rev_y_test)
class_names = [entry.class_name for entry in LPISCLASS]
f1_scores = metrics.f1_score(rev_y_test, py_test_lgbm, labels=class_labels, average=None)
recall = metrics.recall_score(rev_y_test, py_test_lgbm, labels=class_labels, average=None)
precision = metrics.precision_score(rev_y_test, py_test_lgbm, labels=class_labels, average=None)
print('LightGBM:')
print(' Class = F1 | Recall | Precision')
print(' --------------------------------------------------')
for idx, croptype in enumerate([class_names[idx] for idx in class_labels]):
print(' * {0:20s} = {1:2.1f} | {2:2.1f} | {3:2.1f}'.format(croptype,
f1_scores[idx] * 100,
recall[idx] * 100,
precision[idx] * 100))
# TempCNN: F1-score, precision, and recall for each class separately
class_names = [entry.class_name for entry in LPISCLASS]
f1_scores = metrics.f1_score(rev_y_test, py_test_tcnn, labels=class_labels, average=None)
recall = metrics.recall_score(rev_y_test, py_test_tcnn, labels=class_labels, average=None)
precision = metrics.precision_score(rev_y_test, py_test_tcnn, labels=class_labels, average=None)
print('TempCNN:')
print(' Class = F1 | Recall | Precision')
print(' --------------------------------------------------')
for idx, croptype in enumerate([class_names[idx] for idx in class_labels]):
print(' * {0:20s} = {1:2.1f} | {2:2.1f} | {3:2.1f}'.format(croptype,
f1_scores[idx] * 100,
recall[idx] * 100,
precision[idx] * 100))
fig = plt.figure(figsize=(20, 20))
conf_matrix_gbm = metrics.confusion_matrix(rev_y_test, py_test_lgbm)
plot_confusion_matrix(conf_matrix_gbm,
classes=[name for idx, name in enumerate(class_names) if idx in class_labels],
normalize=True,
ylabel='Truth (CROPS)',
xlabel='Predicted (LightGBM)',
title='Confusion matrix');
fig = plt.figure(figsize=(20, 20))
conf_matrix_gbm = metrics.confusion_matrix(rev_y_test, py_test_tcnn)
plot_confusion_matrix(conf_matrix_gbm,
classes=[name for idx, name in enumerate(class_names) if idx in class_labels],
normalize=True,
ylabel='Truth (CROPS)',
xlabel='Predicted (TempCNN)',
title='Confusion matrix');
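With normalize=True, plot_confusion_matrix (a helper defined earlier in the notebook) typically divides each row by its row sum, so each row reads as the per-class recall distribution. The idea in a few lines:

```python
import numpy as np

cm = np.array([[8.0, 2.0], [1.0, 9.0]])  # rows: truth, columns: prediction
cm_norm = cm / cm.sum(axis=1, keepdims=True)

# each row now sums to 1; the diagonal entries are the per-class recall
assert np.allclose(cm_norm.sum(axis=1), 1.0)
assert np.isclose(cm_norm[0, 0], 0.8)  # 80% of class-0 pixels correctly recalled
```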
The validation of the models shows that both perform very well for most of the groups. However, there seem to be differences in their confusion for certain classes.
The LightGBM model contains the information about feature importances. Let's check which features are most important for classification.
eopatch_name = 'eopatch_0_col-0_row-0' # get the name of the first newly created EOPatch
eopatch = EOPatch.load(os.path.join(patch_path, eopatch_name), lazy_loading=True)
timeframe_count = eopatch.data['FEATURES'].shape[0]
features_count = eopatch.data['FEATURES'].shape[3]
del eopatch
z = model_lgbm.feature_importances_.reshape((timeframe_count, features_count))
fnames = ['B02','B03','B04','B05','B06','B07','B08','B8A','B11','B12','NDVI','NDWI','NORM']
fig = plt.figure(figsize=(15, 15))
ax = plt.gca()
# plot the importances
im = ax.imshow(z, aspect=0.25)
plt.xticks(range(len(fnames)), fnames, rotation=45, fontsize=20)
plt.yticks(range(timeframe_count), ['T{}'.format(i) for i in range(timeframe_count)], fontsize=20)
plt.xlabel('Bands and band related features', fontsize=20)
plt.ylabel('Time frames', fontsize=15)
plt.ylim(top=-0.5, bottom=timeframe_count - 0.5)
ax.xaxis.tick_top()
ax.xaxis.set_label_position('top')
fig.subplots_adjust(wspace=0, hspace=0)
cb = fig.colorbar(im, ax=[ax], orientation='horizontal', pad=0.01, aspect=100)
cb.ax.tick_params(labelsize=20)
cb.set_label('Feature importance', fontsize=15)
As you can see, the most important features for LightGBM are recorded within the main growth period, where the different growing stages that characterize certain crop types can be detected.
Now that both models have been validated, the remaining task is to predict the whole AOI. As LightGBM achieves a higher overall accuracy, it is used for the further predictions. If you are interested in a specific crop group for which TempCNN outperforms LightGBM, simply change the following configuration.
# swap the comments to use a different model
# model = joblib.load(os.path.join(models_path, 'model_tcnn_CropTypeClass_{}.pkl'.format(grouping_id))) # load TempCNN model
model = joblib.load(os.path.join(models_path, 'model_lgbm_CropTypeClass_{}.pkl'.format(grouping_id))) # load LightGBM model
# load respective feature scaler
scaler = joblib.load(os.path.join(samples_path, 'Scaler_{}.bin'.format(grouping_id)))
In the following you define a workflow to make a prediction on the existing EOPatches. The EOTask accepts the features and the name for the predicted labels. In addition, you export GeoTIFF images of the prediction to easily access your visual results.
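PredictPatch is a custom EOTask defined earlier in the notebook. Stripped of the eo-learn plumbing, its core might look roughly like this — the class and method names are our own; only the flatten/scale/predict/reshape logic mirrors what the notebook does:

```python
import numpy as np

class SimplePatchPredictor:
    """Sketch: predict a label raster from a (time, height, width, features) array."""

    def __init__(self, model, scaler=None):
        self.model = model
        self.scaler = scaler

    def predict(self, features):
        t, h, w, f = features.shape
        # one row per pixel, columns ordered (time, feature) like the training samples
        X = np.moveaxis(features, 0, 2).reshape(h * w, t * f)
        if self.scaler is not None:
            X = self.scaler.transform(X)
        # reshape the flat predictions back into a (height, width, 1) raster
        return self.model.predict(X).reshape(h, w, 1)
```

Any model exposing a scikit-learn-style predict method fits this interface.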
# TASK TO LOAD EXISTING EOPATCHES
load = LoadTask(patch_path)
# TASK FOR PREDICTION
predict = PredictPatch(model, (FeatureType.DATA, 'FEATURES'), 'LBL_GBM', scaler)
# TASK TO EXPORT TIFF
export_tiff = ExportToTiff((FeatureType.MASK_TIMELESS, 'LBL_GBM'))
tiff_location = predictions_path
if not os.path.isdir(tiff_location):
os.makedirs(tiff_location)
# TASK FOR SAVING
save = SaveTask(patch_path, overwrite_permission=OverwritePermission.OVERWRITE_PATCH)
workflow = LinearWorkflow(load,
predict,
export_tiff,
save)
patch_list = os.listdir(patch_path) # update patch list
# execute workflow
pbar = tqdm(total=len(patch_list))
for patch_name in patch_list:
extra_param = {load: {'eopatch_folder': patch_name},
export_tiff: {'filename': '{}/prediction_{}.tiff'.format(predictions_path, patch_name)},
save: {'eopatch_folder': patch_name}}
workflow.execute(extra_param)
pbar.update()
Having finished the last processing step, let's have a look at the final EOPatch by executing
EOPatch.load(os.path.join(patch_path, 'eopatch_0_col-0_row-0'))
You obtain the following structure, which is extended by your predicted data stored as LBL_GBM in mask_timeless:
EOPatch(
data: {
FEATURES: numpy.ndarray(shape=(34, 1028, 1033, 13), dtype=float64)
FEATURES_SAMPLED: numpy.ndarray(shape=(34, 6000, 1, 13), dtype=float64)
}
mask: {}
scalar: {}
label: {}
vector: {}
data_timeless: {}
mask_timeless: {
LBL_GBM: numpy.ndarray(shape=(1028, 1033, 1), dtype=int64)
LPIS_class_basic: numpy.ndarray(shape=(1028, 1033, 1), dtype=uint8)
LPIS_class_basic_ERODED: numpy.ndarray(shape=(1028, 1033, 1), dtype=uint8)
LPIS_class_basic_ERODED_SAMPLED: numpy.ndarray(shape=(6000, 1, 1), dtype=uint8)
}
scalar_timeless: {}
label_timeless: {}
vector_timeless: {
LPIS_2018: geopandas.GeoDataFrame(columns=['geometry', 'FS_KENNUNG', 'SL_FLAECHE', 'ID', 'SNAR_BEZEI', 'DateImported', 'SNAR_BEZEI_NAME', 'CROP_ID', 'english', 'slovenian', 'latin', 'GROUP_1', 'GROUP_1_original', 'GROUP_1_ID'], length=4091, crs=EPSG:32633)
}
meta_info: {}
bbox: BBox(((420862.3179607267, 5329537.336315366), (431194.28800678457, 5339817.792378783)), crs=CRS('32633'))
timestamp: [datetime.datetime(2018, 1, 6, 0, 0), ..., datetime.datetime(2018, 9, 27, 0, 0)], length=34
)
eopatch_name = 'eopatch_0_col-0_row-0' # get the name of the first newly created EOPatch
eopatch = EOPatch.load(os.path.join(patch_path, eopatch_name), lazy_loading=True)
# update colormap
cb_classes = np.unique(eopatch.mask_timeless['LBL_GBM'])
custom_cmap = mpl.colors.ListedColormap([lpisclass_cmap.colors[i] for i in cb_classes])
custom_norm = mpl.colors.BoundaryNorm(np.arange(-0.5, len(cb_classes), 1), custom_cmap.N)
# mask prediction - exclude pixel with no LPIS reference
labels = np.array(eopatch.mask_timeless['LPIS_class_{}'.format(grouping_id)])
mask = labels == 0
labelspred = np.array(eopatch.mask_timeless['LBL_GBM'])
LBL = np.ma.masked_array(labelspred, mask)
# plot figure
fig, ax = plt.subplots(figsize=(20, 20))
im = ax.imshow(LBL.squeeze(), cmap=lpisclass_cmap, norm=lpisclass_norm)
ax.set_xticks([])
ax.set_yticks([])
ax.set_aspect('auto')
fig.subplots_adjust(wspace=0, hspace=0)
# plot colorbar
cb = fig.colorbar(mpl.cm.ScalarMappable(norm=custom_norm, cmap=custom_cmap),
orientation="horizontal",
pad=0.01,
aspect=100)
cb.ax.tick_params(labelsize=20)
cb.set_ticks(range(len(cb_classes)))
cb.ax.set_xticklabels([class_names[i] for i in cb_classes], rotation=90, fontsize=15)
plt.show()
del eopatch
eopatch_name = 'eopatch_0_col-0_row-0' # get the name of the first newly created EOPatch
eopatch = EOPatch.load(os.path.join(patch_path, eopatch_name), lazy_loading=True)
# mask prediction - exclude pixel with no LPIS reference
labels = np.array(eopatch.mask_timeless['LPIS_class_{}'.format(grouping_id)])
mask = labels == 0
labelspred = np.array(eopatch.mask_timeless['LBL_GBM'])
LBL = np.ma.masked_array(labelspred, mask)
fig = plt.figure(figsize=(20, 10))
# plot prediction
ax1 = plt.subplot(121)
im = ax1.imshow(LBL.squeeze(), cmap=lpisclass_cmap, norm=lpisclass_norm)
plt.title('Prediction')
ax1.set_xticks([])
ax1.set_yticks([])
ax1.set_aspect('auto')
# plot ground truth
ax2 = plt.subplot(122)
im = ax2.imshow(labels.squeeze(), cmap=lpisclass_cmap, norm=lpisclass_norm)
plt.title('Ground truth')
ax2.set_xticks([])
ax2.set_yticks([])
ax2.set_aspect('auto')
axlist=[ax1,ax2]
fig.subplots_adjust(wspace=0, hspace=0)
# plot colorbar
cb = fig.colorbar(mpl.cm.ScalarMappable(norm=custom_norm, cmap=custom_cmap),
ax = axlist,
orientation="horizontal",
pad=0.01,
aspect=100)
cb.ax.tick_params(labelsize=20)
cb.set_ticks(range(len(cb_classes)))
cb.ax.set_xticklabels([class_names[i] for i in cb_classes], rotation=90, fontsize=15)
plt.show()
del eopatch
# Draw the Reference map
eopatch_name = 'eopatch_0_col-0_row-0' # get the name of the first newly created EOPatch
eopatch = EOPatch.load(os.path.join(patch_path, eopatch_name), lazy_loading=True)
# create red-green colormap
colors = [(0, 1, 0), (1, 0, 0)] # G -> R
cmap_name = 'my_list'
cm = LinearSegmentedColormap.from_list(
cmap_name, colors)
fig = plt.figure(figsize=(20, 20))
inspect_size = 100
w, h = labels.squeeze().shape
w_min = np.random.choice(range(w - inspect_size))
h_min = np.random.choice(range(h - inspect_size))
ax = plt.subplot(2, 2, 1)
plt.imshow(labels.squeeze()[w_min: w_min + inspect_size, h_min : h_min + inspect_size],
cmap=lpisclass_cmap, norm=lpisclass_norm)
plt.xticks([])
plt.yticks([])
ax.set_aspect('auto')
plt.title('Ground truth', fontsize=20)
ax = plt.subplot(2, 2, 2)
plt.imshow(LBL.squeeze()[w_min: w_min + inspect_size, h_min: h_min + inspect_size],
cmap=lpisclass_cmap, norm=lpisclass_norm)
plt.xticks([])
plt.yticks([])
ax.set_aspect('auto')
plt.title('Prediction', fontsize=20)
ax = plt.subplot(2, 2, 3)
mask = LBL.squeeze() != labels.squeeze()
plt.imshow(mask[w_min: w_min + inspect_size, h_min: h_min + inspect_size], cmap=cm)
plt.xticks([])
plt.yticks([]);
ax.set_aspect('auto')
plt.title('Difference', fontsize=20)
ax = plt.subplot(2, 2, 4)
image = np.clip(eopatch.data['FEATURES'][8][..., [2, 1, 0]] * 3.5, 0, 1)
plt.imshow(image[w_min: w_min + inspect_size, h_min: h_min + inspect_size])
plt.xticks([])
plt.yticks([]);
ax.set_aspect('auto')
plt.title('True Color', fontsize=20)
fig.subplots_adjust(wspace=0.1, hspace=0.1)
del eopatch
As you can probably see in the randomly chosen section of the AOI, there are certain patterns of misclassified pixels.
Now, after your first successful classification, are you hooked? Perhaps the region around Wels in Austria was not your actual AOI, or you want to try other vegetation groupings? Then here are some suggestions on how you could proceed:
Customize configurations
The notebook offers various possibilities to change parameters and evaluate their effects. Simply enter the configuration section in the beginning and modify e.g. cloudcover thresholds or your sampling strategy.
Change the AOI within Austria
This is the simplest change to apply. You just have to place a Shapefile or GeoJSON of your own AOI in the location of the "Area_AOI.geojson" from the example. The size and shape of the included polygon are irrelevant.
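A minimal GeoJSON skeleton for a replacement AOI could look like this. The coordinates below are placeholders (GeoJSON uses lon/lat in WGS84); substitute your own polygon outline:

```python
import json

aoi = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {},
        "geometry": {
            "type": "Polygon",
            # placeholder lon/lat ring; replace with your own AOI outline
            "coordinates": [[[14.0, 48.1], [14.2, 48.1], [14.2, 48.3],
                             [14.0, 48.3], [14.0, 48.1]]],
        },
    }],
}
with open("Area_AOI.geojson", "w") as f:
    json.dump(aoi, f)
```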
Try alternative crop groupings
In order to regroup the LPIS classes you need to have a closer look at the two CSV files in the GeneralData folder.
at_lpis_2018_crop_to_group_mapping_basic.csv: Here you can assign LPIS classes to different crop groups.
- CROP_ID represents the respective LPIS class
- GROUP_1 represents the respective group you want a class in

crop_group_1_definition_basic.csv: Here you can combine or separate individual crop groups by assigning the respective ID.
- GROUP_1 again represents the groups
- GROUP_1_ID represents the respective numeric ID

Apply the notebook to another country
Another country means different AOI plus different LPIS classes.
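For illustration, the regrouping could also be scripted instead of edited by hand. The column names below (CROP_ID, GROUP_1) come from the mapping-file description above; the delimiter and the example values are assumptions:

```python
import csv
import io

# stand-in for at_lpis_2018_crop_to_group_mapping_basic.csv (values invented)
mapping_csv = io.StringIO(
    "CROP_ID,SNAR_BEZEI_NAME,GROUP_1\n"
    "123,WINTERWEICHWEIZEN,winter cereals\n"
    "456,SILOMAIS,maize\n"
)
rows = list(csv.DictReader(mapping_csv))

# move LPIS class 123 into a different crop group
for row in rows:
    if row["CROP_ID"] == "123":
        row["GROUP_1"] = "summer cereals"
```

Writing the modified rows back out with csv.DictWriter then gives you an alternative grouping to rerun the pipeline with.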